Early mutational signatures and transmissibility of SARS-CoV-2 Gamma and Lambda variants in Chile

Genomic surveillance (GS) programmes were crucial in identifying and quantifying the mutating patterns of SARS-CoV-2 during the COVID-19 pandemic. In this work, we develop a Bayesian framework to quantify the relative transmissibility of different variants tailored for regions with limited GS. We use it to study the relative transmissibility of SARS-CoV-2 variants in Chile. Among the 3443 SARS-CoV-2 genomes collected between January and June 2021, where sampling was designed to be representative, the Gamma (P.1), Lambda (C.37), Alpha (B.1.1.7), B.1.1.348, and B.1.1 lineages were predominant. We found that Lambda and Gamma variants’ reproduction numbers were 5% (95% CI: [1%, 14%]) and 16% (95% CI: [11%, 21%]) larger than Alpha’s, respectively. Besides, we observed a systematic mutation enrichment in the Spike gene for all circulating variants, which strongly correlated with variants’ transmissibility during the studied period (r = 0.93, p-value = 0.025). We also characterised the mutational signatures of local samples and their evolution over time and with the progress of vaccination, comparing them with those of samples collected in other regions worldwide. Altogether, our work provides a reliable method for quantifying variant transmissibility under subsampling and emphasises the importance of continuous genomic surveillance.


Methods overview
We studied surveillance data of 3443 samples collected between January and June 2021 from different Chilean regions in hospitals belonging to the Chilean influenza surveillance network. All samples had to test positive in an RT-qPCR SARS-CoV-2 test with a Ct value lower than 25 and were sent to the Chilean Public Health Institute (ISP) in Santiago for sequencing under a strict cold transportation chain. Whole SARS-CoV-2 genome sequences were obtained using a MiSeq (Illumina) platform with a 300-cycle (total) reagent kit. We assessed sequencing quality with FastQC v0.11.8 and used IRMA (v0.9.3) and MAFFT (v7.458) to assemble and align the genomes, respectively [45,46]. The lineage of each sample was determined using Pangolin v3.1.5 [47]. Then, we defined the most prevalent lineages through their frequency of observation per epidemiological week, selecting the lineages with a frequency equal to or greater than 20%. We limit our analysis window to samples collected from January to June 2021, as after this point the representativeness of the sampling protocol for GS was compromised: samples suspected to belong to the Delta lineage (B.1.617.2) were prioritised for sequencing to achieve other public health goals [48,49].
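The weekly 20% selection rule described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name `predominant_lineages`, the `(week, lineage)` input layout, and the toy records are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def predominant_lineages(samples, threshold=0.20):
    """Return lineages whose share reaches `threshold` in at least one week.

    `samples` is an iterable of (epi_week, lineage) pairs, one per sequenced sample.
    """
    weekly_counts = defaultdict(Counter)
    for week, lineage in samples:
        weekly_counts[week][lineage] += 1

    selected = set()
    for counts in weekly_counts.values():
        total = sum(counts.values())
        for lineage, n in counts.items():
            if n / total >= threshold:
                selected.add(lineage)
    return selected

# Toy records, made up for illustration (not the study data).
toy = (
    [(1, "B.1.1")] * 3 + [(1, "B.1.1.348")] * 6 + [(1, "OTHER")] * 1
    + [(2, "P.1")] * 5 + [(2, "C.37")] * 4 + [(2, "OTHER")] * 1
)
print(sorted(predominant_lineages(toy)))
```

A lineage is kept as soon as it reaches the threshold in a single week, which matches the "during one weekly observation period" wording used in the results.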
The Chilean Ministry of Health coordinates the national influenza/SARS-CoV-2 surveillance network of hospitals and care centres, and is therefore responsible for collecting and selecting the samples to be sequenced by the ISP. Once these samples are sequenced, they are promptly shared with the international public repository GISAID, making them available for this and other scientific studies. Therefore, given that the data we used for this study were openly available, no patient consent was required.
Our Bayesian model simulates the spread of each variant separately using a discrete renewal process [50-53]. In our model, COVID-19 spreads with an inferred time-dependent effective reproduction number $R_{\mathrm{eff},t}$ [54], where the contribution of each variant to $R_{\mathrm{eff},t}$ is modulated by a time-invariant factor $f_{\mathrm{variant}}$. This factor accounts for the relative transmissibility of the variants. We use the Alpha variant as reference (i.e., $f_{\mathrm{Alpha}} = 1$), as its transmissibility has been accurately quantified in settings where it was the sole circulating variant [55]. Knowing the base reproduction number of Alpha enables the estimation of other variants' base reproduction numbers by multiplying it by the corresponding factor $f$. Our model also includes a small random influx of variants from abroad, so new variants appear in our system by importation. This influx is essential to explain the sudden emergence of new variants among sequenced samples. The above implies deviations from ideal (binomial) sampling. Consequently, we also incorporate a correction factor that penalises non-ideal measurements with larger errors than expected under binomial sampling (see "Methods").
We used two data sources to infer the variants' relative transmissibility and other parameters in our model: (1) the weekly averaged variant share (i.e., the fraction a given variant represents of the total samples), which constrains our model under the assumption that these observations follow a multinomial distribution; and (2) the daily number of (largely non-sequenced) observed new cases, which we use to infer the absolute prevalence of the variants over time. Our method differs from the phylodynamic inference of population growth rates as implemented in BEAST 2 [56,57] in that it does not build phylogenetic trees but only groups the different variants together, which substantially simplifies the inference when data are scarce.
To characterise the genetic diversity of the SARS-CoV-2 variants that circulated in Chile, we selected 2650 complete sequences of SARS-CoV-2 (over 29,000 base pairs) and built a phylogenetic tree. Then, we compared the selected sequences to SARS-CoV-2 reference sequences, assigned them to clades, and determined their position within the reference phylogenetic tree using the Nextclade Web tool [58]. We use the normalised Total Mutational Load (nTML) [59] as a proxy for mutation enrichment of different parts of the genome (see "Methods").

Transmissibility of most predominant SARS-CoV-2 variants in Chile
Among the samples collected between January and June 2021 (n = 3443), we identified 86 different SARS-CoV-2 lineages. However, after filtering the dataset to consider only those that represent at least 20% of the total samples during one weekly observation period (n = 2920), we identified five predominant variants: Gamma (labelled as Variant of Concern, VoC), Lambda (labelled as Variant of Interest, VoI), Alpha (VoC), B.1.1.348, and B.1.1 (see Fig. 1a). The Gamma VoC, first reported in November 2020 in Manaus, Brazil [60], was the dominant variant in Chile from May 2021 on, accounting for 1614 samples in the period analysed. It was followed by the Lambda VoI, with 838 samples. The Alpha VoC, to date reported in 183 countries around the world [61], was detected only 158 times in Chile. In addition to these VoCs and the VoI, we identified 252 samples classified as B.1.1.348 and 58 as B.1.1.
The Bayesian model fitted the daily number of cases well (Fig. 1b) by adapting the effective reproduction number (Fig. 1c) and also modelling the share of the different variants over time (Fig. 1d-h). The emergence and sudden increase in the predominance of the Lambda variant around week 12 (cf. Fig. 1g) is unlikely to be due solely to community transmission. As Lambda cases were zero or extremely low, this increase can be explained by an abrupt influx of cases (Supplementary Fig. S2d), which acted as a seed for community transmission.

Mutational load of the Spike gene correlates with variant transmissibility
Although SARS-CoV-2 variants share an evolutionary history, we observe a broad dispersion in the number of mutations (i.e., TML) even within lineages. On the other hand, while some samples of different lineages seem to have the same absolute number of mutations (seen, e.g., when drawing vertical cuts in Fig. 2a), they have different mutational profiles and, therefore, are classified in different clades according to the PANGOLIN criteria. These differences, which are indistinguishable when analysing the total number of mutations, might become evident when studying the relative enrichment in mutations of different regions of the genome (i.e., the nTML).
We computed the nTML for both the whole genome and the Spike gene alone for Chile's predominant circulating variants (cf. Fig. 2b), using the SARS-CoV-2 Wuhan-Hu-1 isolate as the reference (Accession: NC_045512.2). Observed mutations were typically missense, i.e., they cause an observable change in the generated amino acid sequence, and have been reported to also impact the function of certain translated proteins in SARS-CoV-2 [62]. We partially eliminate the codependency between nTML in Spike and the whole genome by subtracting the number of mutations in Spike from those in the whole genome. We observed a statistically significant enrichment in mutations in the Spike gene in all lineages. Among them, the Gamma VoC had the highest number of mutations in the Spike gene, followed by Alpha, Lambda, B.1.1.348, and finally B.1.1 with the lowest nTML (Fig. 2b). The Spike gene showed a marked dispersion in the nTML in all samples compared to the whole genome. There were no relevant temporal variations in nTML except for lineage B.1.1, where the mean nTML in Spike seems to increase at the end of the analysed period (Fig. 2c). However, this observation may be an artefact induced by the low number of samples found for this lineage.
The main difference between variants was the degree of mutation enrichment in the Spike gene, quantified by their median nTML. Furthermore, we found a marked linear correlation between the nTML in Spike and the relative transmissibility of the most predominant lineages (piece-wise linear correlation r = 0.923, p-value = 0.025, Fig. 2d).
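The Spike versus rest-of-genome comparison can be illustrated with a short sketch. The exact normalisation behind the nTML is given in the Methods and in [59], not in this section, so the code below assumes the simplest reading: the number of mutations in a region divided by the length of that region. The function name `ntml` and the example positions are hypothetical; the Spike coordinates and genome length are those of the NC_045512.2 reference.

```python
GENOME_LENGTH = 29_903                      # NC_045512.2 length in nucleotides
SPIKE_START, SPIKE_END = 21_563, 25_384     # S gene, 1-based inclusive coordinates

def ntml(mutation_positions):
    """Return (nTML in Spike, nTML in the rest of the genome) for one sample.

    ASSUMPTION: nTML = mutations in a region / length of that region.
    Spike mutations are subtracted from the whole-genome count, so the two
    values do not share any mutation.
    """
    spike_len = SPIKE_END - SPIKE_START + 1
    in_spike = sum(SPIKE_START <= p <= SPIKE_END for p in mutation_positions)
    outside = len(mutation_positions) - in_spike
    return in_spike / spike_len, outside / (GENOME_LENGTH - spike_len)

# Illustrative nucleotide positions, not a sample from the study.
example_positions = [241, 3037, 14408, 21614, 23012, 23403]
spike_ntml, rest_ntml = ntml(example_positions)
print(spike_ntml > rest_ntml)
```

Because the Spike gene is much shorter than the rest of the genome, each additional Spike mutation moves its nTML in larger steps, which is in line with the discreteness noted for the Spike traces in Fig. 2b.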

Local samples differ from reference genomes
Chilean samples of different variants systematically exhibit mutation patterns not present in the reference lineages, i.e., they drift from the minimal list of defining mutations presented in Supplementary Table S1. We studied the mutational profile of Chilean samples categorised among the five lineages studied here and calculated the frequency of non-defining mutations. Table 1 summarises the results after filtering for the mutations present in at least 25% of the samples detected in the time frame analysed.
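The 25% filter on non-defining mutations can be sketched as below. This is an illustrative reading of the procedure, assuming each sample is represented as a set of mutation labels; the function name `frequent_non_defining`, the label format, and the toy samples are made up for the example.

```python
from collections import Counter

def frequent_non_defining(samples, defining, min_fraction=0.25):
    """Frequency of non-defining mutations present in >= min_fraction of samples.

    samples  : list of sets of mutation labels, one set per sample of a lineage
    defining : set of lineage-defining mutation labels to exclude
    """
    counts = Counter(m for s in samples for m in set(s) - defining)
    n = len(samples)
    return {m: c / n for m, c in counts.items() if c / n >= min_fraction}

# Toy samples for one lineage (hypothetical, not the study data).
defining = {"S:N501Y"}
toy_samples = [
    {"S:N501Y", "NSP12:P322L"},
    {"S:N501Y", "NSP12:P322L", "NSP3:F106F"},
    {"S:N501Y"},
    {"S:N501Y", "NSP12:P322L"},
    {"S:N501Y", "N:R203R"},
]
print(frequent_non_defining(toy_samples, defining))
```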
The mutations found in Chilean samples are not limited to the Spike gene or structural domains. All variants share the P322L mutation in the RdRp complex (NSP12), which is responsible for replicating and transcribing the virus genome [63]. Closely related lineages tend to accumulate similar non-defining mutational patterns, such as mutations F106F and L357F (NSP3), R203R (N), and V149F (NSP6), which were frequently observed among both B.1.1 and B.1.1.348 samples. B.1.1 samples also accumulated the A262S and G1167A mutations in the Spike protein, which have been reported to significantly increase the virus's infectivity and have a synergistic effect when occurring simultaneously [64].
On the other hand, lineages start accumulating mutations that become defining for the more evolved variant. For example, the NSP6_S106 mutation, detected among all lineages except for B.1.1.348, and especially prevalent for the VoI/VoCs, became defining for the Omicron variant [65].

Temporal drift of variants' mutational signatures
We analysed the temporal trends in the frequency of non-defining mutations observed among the samples of each lineage, selecting only those mutations with the largest variance. Samples belonging to the B.1.1 and B.1.1.348 lineages (Fig. 3a,b) presented the highest variability in their mutational profile. However, these lineages were observed less frequently than the others (especially B.1.1), so subsampling-induced noise can explain part of the variability.
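A minimal sketch of this selection step follows, assuming samples are grouped by epidemiological week and represented as sets of mutation labels. The function name `most_variable_mutations`, the use of the population variance as the ranking criterion, and the toy data are assumptions for illustration only.

```python
from statistics import pvariance

def most_variable_mutations(weekly_samples, top_k=2):
    """Rank mutations by the variance of their weekly frequency.

    weekly_samples : dict mapping week -> non-empty list of sets of mutation labels
    Returns the top_k (mutation, weekly frequencies) pairs, most variable first.
    """
    weeks = sorted(weekly_samples)
    mutations = set().union(*(s for w in weeks for s in weekly_samples[w]))
    freq = {
        m: [sum(m in s for s in weekly_samples[w]) / len(weekly_samples[w])
            for w in weeks]
        for m in mutations
    }
    # Sort by decreasing variance; ties are broken alphabetically.
    ranked = sorted(freq, key=lambda m: (-pvariance(freq[m]), m))
    return [(m, freq[m]) for m in ranked[:top_k]]

# Toy data with placeholder labels (hypothetical).
toy_weeks = {
    1: [{"A", "B"}, {"A"}],
    2: [{"A"}, {"A", "C"}],
    3: [{"A", "C"}, {"A", "C"}],
}
print(most_variable_mutations(toy_weeks))
```

The weekly frequency vectors returned here are the rows one would plot in a heat map such as those in Fig. 3.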
Lineages with the highest relative reproduction number and nTML, i.e., Gamma, Lambda, and Alpha (cf. Fig. 2), persisted throughout vaccine roll-out, as samples collected in this period were still identified as such. However, we observed specific changes in these variants' mutational signatures (and frequencies) over time. In particular, the frequency of some mutations in the analysed samples dropped drastically as vaccination progressed. For example, the P411P (NSP12 gene), S36S (NSP2 gene), F106F and F1089F (NSP3 gene) mutations in Alpha (Fig. 3c) and the D156D (NSP1 gene), Y31Y (NSP9 gene), D139D (NSP12 gene), R203R (N gene) and D10D, P1200P, V1298V (NSP3 gene) mutations in Gamma (Fig. 3d) were not frequently observed after reaching the ≈ 20% fully vaccinated population milestone. On the other hand, mutations in the B.1.1 variant (Fig. 3a) are highly variable, which could be a subsampling artefact, and the mutational profile of the Lambda variant does not change significantly over time or with the progress of vaccination (Fig. 3e).

Discussion
In this work, we quantified the relative transmissibility of Chile's predominant SARS-CoV-2 variants using a Bayesian model for disease spread tailored for conditions with limited genomic surveillance. The time frame of our analysis is limited to January to June 2021, as the representativeness of the sampling protocol was compromised after the importation of the Delta VoC, when samples suspected to be Delta were prioritised for sequencing [11,15].
We estimate the effective reproduction number ($R_{\mathrm{eff},t}$) and thereby the transmissibility of all variants relative to the Alpha VoC. Due to the effective nature of the inferred reproduction number, an increase in variant-specific reproduction numbers is not necessarily an increase in the variant's base transmissibility but could reflect the particularities of the population where it spreads, e.g., overall immunity levels. We found that the relative transmissibility of the Gamma VoC compared to Alpha was $f_{\mathrm{Gamma}} = 1.16$ (95% CI: [1.11, 1.21]), which is in good agreement with the literature. For example, a contemporary study using pooled estimates across several countries reports $f_{\mathrm{Gamma}} = 1.1$ (95% CI: [1.03, 1.17]) [67]. Another study reports $f_{\mathrm{Gamma}} = 1.09$ (95% CI: [0.82, 1.44]) and $f_{\mathrm{Gamma}} = 1.09$ (95% CI: [0.96, 1.25]) using 2.1 and 6.4 million sequences worldwide, respectively [10,68]. (Note that the results in [10,68] are relative transmissibilities and 95% confidence intervals with respect to the wild type (WT) lineage, $R/R_{\mathrm{WT}}$ with 95% CI $[lb, ub]$, so the relative transmissibility between two variants and its 95% CI were estimated as $R_1/R_2$ $[lb_1/ub_2,\ ub_1/lb_2]$.) Our estimates better reflect the situation observed in Manaus, Brazil, for P.1, with peak transmissibility at reproduction numbers close to 5 [60].
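The interval conversion used in the note above, $R_1/R_2$ with bounds $[lb_1/ub_2,\ ub_1/lb_2]$, can be written as a small helper. The function name `relative_to` and the numbers in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from [10,68].

```python
def relative_to(variant, reference):
    """Re-express a WT-relative estimate relative to another variant.

    Both arguments are (estimate, lower, upper) tuples relative to the wild type.
    The bounds pair the lower limit of one with the upper limit of the other,
    which gives a conservative (wide) interval.
    """
    est_v, lb_v, ub_v = variant
    est_r, lb_r, ub_r = reference
    return est_v / est_r, lb_v / ub_r, ub_v / lb_r

# Hypothetical inputs: variant 1.5 [1.2, 1.8] and reference 1.25 [1.0, 1.5] vs. WT.
ratio, lower, upper = relative_to((1.5, 1.2, 1.8), (1.25, 1.0, 1.5))
print(ratio, lower, upper)
```

Since the two intervals are combined at their opposite extremes, the resulting range is wider than either input, which is consistent with the broad intervals quoted for the literature values above.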
Our findings for the relative transmissibility of the Lambda VoI, $f_{\mathrm{Lambda}} = 1.05$ (95% CI: [1.01, 1.14]), are also within the ranges reported in [10,68], where it was estimated as $f_{\mathrm{Lambda}} = 1.05$ (95% CI: [0.8, 1.39]) and $f_{\mathrm{Lambda}} = 1.12$ (95% CI: [1.03, 1.21]) using 2.1 and 6.4 million sequences worldwide, respectively. Although the transmissibility of Lambda was not quantified in Peru, a highly affected neighbouring country, it has been shown that it swiftly replaced other circulating variants there [69-71], which argues in favour of a higher value of $f_{\mathrm{Lambda}}$. However, in Chile, the Lambda variant did not fully replace the Gamma VoC in the timeframe analysed, which is consistent with our finding that $f_{\mathrm{Lambda}} < f_{\mathrm{Gamma}}$ in these regional settings. As time evolved, the partial immune escape of Lambda helped it replace Gamma in some regions and thus yielded a higher $f_{\mathrm{Lambda}}$, as reported in the latest version of [10].
The mutational characterisation presented in this work provides valuable preliminary insights into the genomic factors associated with the transmissibility of SARS-CoV-2 variants. In particular, we found a statistically significant enrichment of mutations in the Spike gene compared to the rest of the genome. Moreover, this enrichment in the Spike gene exhibited a strong positive correlation with the transmissibility of the analysed variants. This enrichment can be attributed to the critical role of the Spike protein, which facilitates the virus's entry into host cells [72]. Interestingly, we found that the Lambda variant has a lower nTML in the Spike gene than the Alpha variant, yet it exhibits higher relative transmissibility. This may be due to the presence of the L452Q and F490S mutations, which have been identified as critical drivers of Lambda's spread in South America [73].
Specific mutations in the Spike gene of the Gamma and Lambda variants were crucial for the survival of these variants during vaccine roll-out. For the Gamma variant, Spike mutations have been associated with enhanced transmissibility (N501Y) and with partial immune escape (K417T and E484K) [74]. For the Lambda variant, Spike mutations L452Q, F490S and deletion 246-252 conferred partial immune escape against neutralising antibodies elicited by CoronaVac and a higher infectiousness than the Gamma variant [75]. Although all vaccines, and therefore vaccine-elicited antibodies, are targeted towards the SARS-CoV-2 Spike protein, mutational data suggest that evolutionary pressure was also exerted on other viral genes. This becomes evident after reaching the 20% vaccination milestone, i.e., when the eldest 20% of the population had been vaccinated, in weeks 13-14. Many earlier mutations in non-Spike genes disappeared during this transition, while others increased their frequency (cf. Supplementary Table S5 and Fig. 3). On the one hand, receding lineages (B.1.1 and B.1.1.348) tended to develop new Spike mutations before disappearing. For example, the S373P mutation in the RBD domain has been reported to partially escape immunity granted by mRNA vaccines and decrease plasma therapy success [76], while R346K conferred higher transmissibility [77]. On the other hand, thriving lineages (Gamma and Lambda) tended to conserve and fix pre-existing Spike mutations. All lineages, except for Lambda, consistently developed non-Spike mutations during vaccine roll-out. The remaining variants were probably selected through epistatic fitness of a restricted protein subgroup, particularly the Spike (S) and nucleocapsid (N) proteins [78]. However, with the data we have, we cannot infer a causal relationship between the vaccination process and the extinction of lineages or the appearance of mutations: whether there is causality behind this correlation should be studied separately.
Our modelling approach enables quantifying the relative transmissibility of different variants spreading simultaneously in settings with limited GS. Overcoming subsampling, besides requiring modelling the possibility of imperfect sampling, comes at the cost of simplifying assumptions. We assume that the generation interval does not vary between variants, which is a common assumption in the field (used, e.g., in [10]). It is known that ignoring potential differences in the generation interval of SARS-CoV-2 variants might affect the estimate of their relative transmissibility [79]. However, these effects are minimal when $R_{\mathrm{eff}} \approx 1$. Further, serial intervals are not drastically different between these VoCs (which is not necessarily the case for the Delta and Omicron VoCs [80]), and the credible intervals of their estimates overlap [81], further justifying our choice of timeframe of analysis. Besides, we assumed that the influx of infections (and thereby, of variants) was proportional to the COVID-19 incidence in neighbouring countries and evenly distributed across all tracked variants. Currently, the influx corresponds to a tiny percentage of the total cases (≤ 5%, cf. Fig. 1b and Fig. S2a-e). However, more exact modelling would be required when neighbouring countries have considerably more cases than the country of study, as the influx can considerably affect community spread. Unfortunately, GS in neighbouring countries was not representative enough to allow us to incorporate it into our workflow (see, e.g., [24]). More details regarding model robustness are presented in the Supplementary Materials.
Quantifying the transmissibility of emerging public health threats and understanding the mechanisms behind them is crucial for guiding effective control and prevention strategies for emerging threats and diseases with nontrivial endemic patterns [82,83]. The methodology proposed in this work can promptly quantify the relative transmissibility of viral variants even in situations with limited GS. Besides providing timely insights on the regional characteristics of the spread of Gamma, Lambda and other SARS-CoV-2 variants, our study provides a tool for countries with limited capacity for GS to maximise the information they extract. With this methodology and ensuring a representative sampling (following, e.g., [11]), we demonstrate the benefits of running GS programmes and their crucial role in public health.

Nucleic acid extraction and amplification
Nasopharyngeal samples, previously confirmed as positive for SARS-CoV-2, were used for total nucleic acid extraction using the automated system Zybio EXM 6000. Reverse transcription for cDNA synthesis was performed with the SuperScript III One-Step RT-qPCR System with Platinum Taq Kit and RNase OUT (Invitrogen), with 2 mM random primers and 4.5 µM DTT at 55 °C for 60 min. cDNA was amplified based on the COVID-19 ARTIC Illumina Library Construction and Sequencing Protocol V.3 (Farr, 2020), generating two pools with 400 bp amplicons covering the whole viral genome.

Library preparation
DNA fragments from each pool were mixed together and the library was prepared with the Illumina DNA PREP kit (Illumina, San Diego, CA, USA), purified using Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA) and quantified with a Victor Nivo Fluorimeter (Perkin Elmer) using the Quant-it dsDNA HS Assay Kit (Invitrogen). DNA libraries were sequenced in a MiSeq (Illumina) using a 300-cycle kit. Around 0.3 GB of data was obtained for each sample.

We built our model on top of our existing spreading dynamics model [54] to assess the relative transmissibility of the different variants in Chile. Given additional data, this model can be easily adapted for other countries or time frames. We simulated the spread of each variant independently, whereby the susceptible pool $S$ was shared across the different variants. For each variant $v$, we computed the number of newly exposed $E_v$ iteratively, given a prior distribution for $E_{v,0}$ and the generation interval distribution $g$ with hyperprior $m$. This follows the work of [50-52]. To account for non-pharmaceutical interventions or other measures against the spread, we introduced the time-dependent effective reproduction number $R_{\mathrm{base},t}$, which is allowed to change every 14 days relative to the previous reproduction number.
For each variant $v$, the effective reproduction number was modulated by the time-invariant factor $f_v$, called relative reproduction number in the text, such that the effective reproduction number of variant $v$ is $R_{\mathrm{eff},v,t} = f_v \cdot R_{\mathrm{base},t} \, S_t / N$. We fixed $f_v = 1$ for the Alpha variant. Additionally, to account for cases induced by travel, we also add a small random influx for each variant $v$, which was scaled by the reported case numbers in the neighbouring countries $M_t$ (we used Argentina, Peru and Brazil). The spreading dynamics in our model are formulated in discrete time, where $N$ is the population size of the considered country (Chile). The prior for the generation interval is a little longer than the estimates of the generation interval of the Delta variant [84,85], but shorter than the estimated serial interval of the original strain. The susceptible pool is initialised with the population size. The prior distributions for the initial new cases of each variant $E_{v,0}$ are essentially flat (as described in [54]).
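To make the structure of such a model concrete, the following is a minimal forward simulation of a discrete renewal process with several variants sharing one susceptible pool. It is a sketch under stated assumptions, not the inference model of the paper: the function name `simulate_variants`, the generation-interval weights, the seeding, and all numerical values are hypothetical, and the influx is passed in as a fixed array rather than inferred. Only the relation $R_{\mathrm{eff},v,t} = f_v \cdot R_{\mathrm{base},t} \, S_t / N$ is taken from the text.

```python
import numpy as np

def simulate_variants(f, r_base, g, seed, influx, population):
    """Forward-simulate new infections per variant with a shared susceptible pool.

    f      : (V,) relative reproduction numbers (reference variant = 1)
    r_base : (T,) base reproduction number per day
    g      : (L,) generation-interval weights, g[k] = weight of lag k + 1 days
    seed   : (V, L) infections on the L days before the simulation starts
    influx : (V, T) imported infections per variant and day
    """
    f, g, seed = (np.asarray(x, dtype=float) for x in (f, g, seed))
    n_var, n_lag = seed.shape
    n_days = len(r_base)
    e = np.concatenate([seed, np.zeros((n_var, n_days))], axis=1)
    s = np.empty(n_days + 1)
    s[0] = population
    for t in range(n_days):
        i = n_lag + t
        # Past infections (most recent day first) weighted by the generation interval.
        pressure = e[:, i - n_lag:i][:, ::-1] @ g
        r_eff = f * r_base[t] * s[t] / population   # R_eff,v,t = f_v * R_base,t * S_t / N
        new = r_eff * pressure + influx[:, t]
        total = new.sum()
        if total > s[t]:                            # cannot exceed remaining susceptibles
            new *= s[t] / total
        e[:, i] = new
        s[t + 1] = s[t] - new.sum()
    return e[:, n_lag:], s

# Hypothetical two-variant example: a reference variant and one with f = 1.16.
days = 60
g = [0.1, 0.3, 0.3, 0.2, 0.1]
new_cases, susceptible = simulate_variants(
    f=[1.0, 1.16], r_base=np.full(days, 1.2), g=g,
    seed=np.full((2, len(g)), 10.0),
    influx=np.zeros((2, days)), population=1_000_000,
)
share = new_cases[1] / new_cases.sum(axis=0)
print(round(share[0], 3), round(share[-1], 3))
```

With identical seeding, the share of the variant with the larger factor grows over time, which is the signal the variant-share data provide about $f_v$.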
The priors for the time-invariant contribution factors $f_v$ were identical for all variants, so that no prior knowledge about a specific variant's contribution is incorporated. Further, we chose a median of one: as $f_v$ acts as a multiplicative factor, this prior can be seen as relatively uninformative.
In addition to the five variants mentioned in the main text, we also include in our model the share of sequenced cases not categorised into these five variants ($f_{\mathrm{others}}$). In contrast to the five main variants, the relative reproduction number of these other variants is allowed to vary over time (described later). The external input for each variant $v$ was modelled in a weekly fashion, indexed by $t_w$, to decrease the number of variables to be estimated. Here we chose a small contribution for each variant, as we expect the influx to be less predominant than in-country infections; we assume 0.0005%.
Let $y_{v,t}$ be the measured number of samples successfully sequenced (from samples having a positive RT-qPCR test) corresponding to variant $v$. Let $n_t$ be the total number of sequenced samples and $\tau_{v,t}$ the inferred relative case numbers of variant $v$ at time $t$ compared to the total non-variant case numbers. If we model the number of samples $y_{v,t}$ corresponding to a variant $v$ as a multinomial random variable, and assume that samples collected for sequencing are independent, we can build the multinomial likelihood function for our model from our real-world data $y$ and $n$ and the variant fractions $\tau$ from the model. The fraction $\tau_{v,t}$ is obtained from the model as the ratio between the daily cases of variant $v$ and the total daily cases. However, in our model we do not use this multinomial likelihood function but instead parameterise our model using its conjugate distribution, the Dirichlet distribution. In theory, this is equivalent to using the multinomial distribution. The advantage is that we can add a factor $w$ that parameterises a possibly non-optimal sampling strategy, for example, samples that are not perfectly randomised across the country but are correlated to some extent. Mathematically, this has the consequence that the measured fractions $y_{v,t}/n_t$ are all reduced by a factor $w$. The resulting likelihood function is given by Eq. (13).

To infer the slowly changing reproduction number, we introduce sigmoidal change points relative to the previous reproduction number, whereby the priors for the date of occurrence $d$ of each change point $c$ are set every 14 days. The priors for the transient length $l$ and the date $d$ of each change point $c$ are relatively flat to express our uncertainty in these values.
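The effect of the factor $w$ on a Dirichlet likelihood can be illustrated numerically. The sketch below assumes concentration parameters of the form $\alpha_v = w \cdot y_{v,t}/n_t + 1$; this form, the function name `dirichlet_loglik`, and the counts used are assumptions for illustration, and the priors on $w$ used in the paper are not reproduced here.

```python
from math import lgamma, log

def dirichlet_loglik(tau, y, w):
    """Log-density of model fractions `tau` under Dirichlet(alpha).

    ASSUMPTION: alpha_v = w * y_v / n + 1, with n the total number of samples.
    tau : model variant fractions for one time point (positive, summing to 1)
    y   : observed sample counts per variant for the same time point
    """
    n = sum(y)
    alpha = [w * y_v / n + 1.0 for y_v in y]
    log_norm = lgamma(sum(alpha)) - sum(lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * log(t) for a, t in zip(alpha, tau))

# Hypothetical counts for three variants in one week.
counts = [50, 30, 20]
matching = [0.5, 0.3, 0.2]       # model fractions equal to the observed shares
mismatched = [0.2, 0.3, 0.5]

for w in (10, 100):
    gap = dirichlet_loglik(matching, counts, w) - dirichlet_loglik(mismatched, counts, w)
    print(w, round(gap, 2))
```

In this sketch, $w$ behaves like an effective sample size: a smaller $w$ flattens the likelihood, so the same mismatch between modelled and observed shares is penalised less, which is one way to tolerate correlated or otherwise non-ideal sampling.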
For the five variants that we focused on in the main text, $R_{\mathrm{base},t}$ is multiplied by a time-invariant relative reproduction number $f_v$. For the spread of the 'other variants', which we modelled separately, we multiplied $R_{\mathrm{base},t}$ by a time-dependent $f_{\mathrm{others}}(t)$, as the mixture of variants can slowly change over time. We assumed that this change is slower than that of $R_{\mathrm{base},t}$.

In addition to the sequenced samples, we constrain our model using the publicly reported case numbers in Chile, $C_t$, aggregated by the Johns Hopkins University [86]. We sum over the newly infected pools of all variants to obtain the total number of new infections, $E_t = \sum_v E_{v,t}$. These are then delayed with a LogNormal kernel with mean delay $D$ to account for reporting delays and further modulated by a weekly absolute sine function parameterised by an amplitude $h_w$ and an offset $\chi_w$.
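The summation over variants and the reporting delay can be sketched as a discrete convolution. This is an illustration only: the function name `delayed_cases`, the parameterisation by a median delay and a log-scale width, and all numerical values are assumptions, and the weekly modulation is left out because its exact functional form is not given in this section.

```python
import numpy as np

def delayed_cases(new_infections, median_delay=10.0, sigma=0.3, max_delay=30):
    """Convolve daily infections with a discretised log-normal delay kernel.

    ASSUMPTION: the kernel is evaluated at integer delays 1..max_delay and
    normalised to sum to one; delay 0 carries no weight.
    """
    days = np.arange(1, max_delay + 1)
    pdf = np.exp(-(np.log(days) - np.log(median_delay)) ** 2 / (2 * sigma ** 2)) / days
    kernel = np.concatenate(([0.0], pdf / pdf.sum()))
    return np.convolve(new_infections, kernel)[: len(new_infections)]

# Hypothetical input: two variants, a single burst of infections on day 0.
new_by_variant = np.zeros((2, 60))
new_by_variant[:, 0] = [60.0, 40.0]
total_infections = new_by_variant.sum(axis=0)      # E_t = sum over variants of E_{v,t}
reported = delayed_cases(total_infections)
print(int(np.argmax(reported)))
```

Every infection is eventually reported in this sketch (the kernel sums to one), so the delay only shifts and smears the epidemic curve without changing its total.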
$$\tau_{v,t} \sim \mathrm{Dirichlet}\left(\alpha = w \cdot \frac{y_{v,t}}{n_t} + 1\right) \qquad (13)$$

The likelihood given the reported case numbers $C_t$ is then modelled by a StudentT distribution and quantifies the similarity between the model outcome and the available real-world time series. The scale factor $\kappa$ heuristically incorporates the measurement noise.
For a complete list of model parameters and priors, see Supplementary Tables S3 and S4, respectively.
To estimate the parameters of the Bayesian model, we use Monte Carlo sampling. In this way, we also obtain credible intervals of the parameters and not only the maximum likelihood estimate. Specifically, the sampling was performed using PyMC3 [87]. We use a NUTS sampler [88], which is a Hamiltonian Monte Carlo sampler. The chains are initialised randomly. We run 16 chains for 1200 tuning steps and sample for 1500 steps. The maximum tree depth is set to 10 and the target acceptance ratio to 0.95.
To assess whether the chains mix well and the model converges, we plot the values of the inferred relative transmissibility over time (Supplementary Fig. S3). All variants display good mixing except Gamma (P.1), which shows a slightly bimodal behaviour. Therefore, the median and credible interval for the relative transmissibility of Gamma might be slightly biased, as we do not have the mathematical assurance that our model has converged for this variable.
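Visual inspection of the traces can be complemented by a numerical diagnostic. The sketch below computes the classic Gelman-Rubin potential scale reduction factor; this is not the check reported above (which is based on trace plots), and the function name `gelman_rubin` and the synthetic chains are assumptions for illustration. The chain count and length mirror the 16 chains of 1500 draws mentioned in the text.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic (non-split) potential scale reduction factor.

    chains : array of shape (n_chains, n_draws) for a single scalar parameter
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    between = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    within = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    var_hat = (n - 1) / n * within + between / n
    return float(np.sqrt(var_hat / within))

# Synthetic chains: one well-mixed set and one where half the chains sit in a second mode.
rng = np.random.default_rng(0)
well_mixed = rng.normal(size=(16, 1500))
bimodal = well_mixed.copy()
bimodal[:8] += 3.0

print(gelman_rubin(well_mixed) < 1.01, gelman_rubin(bimodal) > 1.1)
```

Values close to one indicate that chains agree with each other, whereas chains that settle in different modes inflate the between-chain variance and push the statistic well above one.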

Figure 1. Bayesian inference enables individual assessment of the contribution of different SARS-CoV-2 variants to the spread of COVID-19. (a) Throughout 2021, five SARS-CoV-2 variants were identified as predominant in Chile: two considered Variants of Concern (VoC) by the WHO (Alpha and Gamma), one Variant of Interest (Lambda), and two other unflagged lineages (B.1.1 and B.1.1.348). The black line (total) also includes other non-predominant variants. Assuming that the contribution of each variant to the spreading dynamics (a-c) is proportional to their share (i.e., the fraction they represent of the total samples, d-h), we quantified their transmissibility compared to the Alpha variant (i-m). The Lambda and Gamma variants showed reproduction numbers 1.05 (95% CI [1.01, 1.14]) and 1.16 (95% CI [1.11, 1.21]) times that of the Alpha variant, respectively. Other variants had a comparatively lower influence on the spread. Shaded areas in panels b-h account for the 95% credible intervals of the model fit. Complementary parameters and variables are summarised in Supplementary Fig. S1.

Figure 2. Predominant variants are enriched with mutations in the Spike gene. (a) The Nextclade-based (https://clades.nextstrain.org/tree) phylogenetic tree of the SARS-CoV-2 variants isolated in Chile, visualised using the Auspice online tool (https://auspice.us/) based on n = 2650 SARS-CoV-2 cases. The sequences are placed on a global reference tree (grey branches and nodes), and clades are assigned to the nearest neighbour, while the branches with coloured circles represent lineages from Chile. (b) The normalised Total Mutational Load (nTML) indicates that the Spike gene is enriched in mutations compared to the entire genome for all analysed variants. The apparent discreteness of the Spike nTML traces is due to the shorter gene length. The white points denote the median, black boxes denote the interquartile ranges, whiskers (thin black lines) extend until at most 1.5 times the length of the interquartile range, and dot opacity denotes the time when samples were collected (light → old, dark → recent). Significance levels were determined with a U-test (see Supplementary Table S2). (c) The most predominant variants do not show a considerable drift in their average nTML over time. Dotted lines account for weeks when the variants were not observed. (d) There is a marked and significant positive correlation between nTML in Spike and the variants' relative transmissibility (median r = 0.923, p-value = 0.025). Vertical error bars are those reported in Figure 1, asterisks denote median values, and horizontal error bars were estimated through bootstrapping.
https://doi.org/10.1038/s41598-024-66885-2 www.nature.com/scientificreports/

Figure 3. Signatures of the settlement, replacement, and selection of mutations in the different observed lineages of SARS-CoV-2. Throughout 2021, the set of mutations present in the analysed samples of the predominant lineages changed. This temporal evolution of the mutational footprint of the lineages can be quantified by the proportion of the analysed samples that present a given mutation. We selected the mutations with the largest temporal variability for each lineage, and we present their evolution as a heat map. (a-e) Evolution of the fraction of the samples presenting a given mutation for the B.1.1 (a), B.1.1.348 (b), Alpha (c), Gamma (d), and Lambda (e) variants, respectively, with their number of observations. Triangle markers at the lower end of each heat map account for the progress in vaccination.

Table 1. Most predominant non-defining mutations in Chilean samples.